Predictions enable top-down pattern separation in the macaque face-processing hierarchy

Distinguishing faces requires clearly distinguishable neural activity patterns. Contextual information may separate neural representations, leading to enhanced identity recognition. Here, we use functional magnetic resonance imaging to investigate how predictions derived from contextual information affect the separability of neural activity patterns in the macaque face-processing system, a three-level processing hierarchy in ventral visual cortex. We find that in the presence of predictions, early stages of this hierarchy exhibit well-separable and high-dimensional neural geometries resembling those at the top of the hierarchy. This is accompanied by a systematic shift of tuning properties from higher to lower areas, endowing lower areas with higher-order, invariant representations instead of their feedforward tuning properties. Thus, top-down signals dynamically transform neural representations of faces into separable and high-dimensional neural geometries. Our results provide evidence of how predictive context flexibly transforms representational spaces to make optimal use of the computational resources provided by cortical processing hierarchies for better and faster distinction of facial identities.

Figure S1: Face-areas localized in both monkeys using functional MRI. Significance maps from the contrast [faces vs. all other categories] are overlaid on coronal slices of anatomical MRIs. All localized face-areas are indicated with a white arrow (AM in the right hemisphere of Monkey P is not shown on this slice). Color bars indicate the negative log p-values of the significance map. Positive values of AP (in mm) indicate anterior from the earbars.

Figure S2: Pupil entrainment as a signature of statistical learning of face-pairs. A. We presented a continuous sequence of face images at a rate of 1 Hz (image duration 500 ms, interstimulus interval 500 ms). The faces were systematically arranged such that one specific image predicted one other image, forming pairs. Hence, the frequency at which the pairs were presented was 0.5 Hz. The pupil is known to entrain to the image frequency (at 1 Hz) and to the pair frequency (at 0.5 Hz) once the pair structure has been learned 1. B. Example runs showing pupil dilation dynamics to the face-pairs for both monkeys. The plots show averaged normalized pupil area (arbitrary units) aligned to the start of each pair. Shading indicates SEM. C. We found significant phase locking, quantified by intertrial phase coherence (ITC), at the pair frequency (0.5 Hz) in both monkeys (Monkey P: p=3.0561e-05; Monkey L: p=0.0034, one-sided t-test against 1). ITC at the pair frequency increased after learning (p=0.017, unpaired one-sided t-test; data only available for Monkey P). Error bars depict SEM. The face images in panel A were created using FaceGen Modeller.

Figure S3: Population response magnitude per condition and face-area. A. Population response magnitudes were quantified to distinguish differences in the magnitude of the IT population response from angular differences in stimulus identity. Population response magnitude and population response vector direction are thought to contain differential information 2,3. Response magnitude was quantified as the Euclidean (L2) norm across all voxels in a given ROI for each stimulus separately and then averaged. There was no significant correlation between the population magnitude response and pattern separability (vector direction) across areas and conditions (Spearman's rho=0.0167, p=0.9816). B. Prediction errors (PEs) are often described in terms of the magnitude of neural activity. We found higher response magnitudes when the same faces appeared as a violation of an expectation (red, unexpected condition) than when they were expected (gray) in all face-areas (paired t-tests, all p<3.5e-04, corrected for multiple comparisons). Response magnitudes were also larger in the unexpected than in the context-free condition (green; unpaired t-tests, all p<1.5e-07, corrected for multiple comparisons). Individual datapoints are population response magnitude per stimulus averaged over the monkeys. Error bars depict SEM. The face images in panel A were created using FaceGen Modeller.

Figure S4: Stimuli and task-schematic. A. Trained pairs in the statistical learning paradigm: In the training phase of the statistical learning paradigm, the monkeys were exposed to face-pairs to establish associations between the predictor and the successor images. Pairs were arranged such that a predictor face (one identity-view combination) would uniquely predict its specific successor face (one other identity-view combination). Training was conducted with sequentially presented face-pairs for at least 3 weeks for both monkeys. The sequence of pairs was designed such that the transitional probability within a pair was 100% and the transitional probability between pairs was kept at a minimum. B. Task-design for the test phase: In the test phase, monkeys were presented with a sequence of faces in a rapid event-related design during fMRI. Each face image was shown for 500 ms followed by a baseline period, and each predictor face was followed by a successor face. The baseline period was temporally jittered to allow single-trial responses to be extracted from the BOLD signal. Baseline periods were pseudorandomly selected (intra-pair baseline: 1.5, 3.5, or 5.5 s; inter-pair baseline: 5.5, 7.5, or 9.5 s). Trained pairs, i.e., 'expected' successors, were presented in 60% of all trials. Prediction violations were introduced by changing the successor image. The face images were created using FaceGen Modeller.
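
The population response magnitude described for Figure S3 (the Euclidean norm across all voxels of an ROI, computed per stimulus and then averaged) can be illustrated with a minimal sketch. The function names, the array layout (stimuli in rows, voxels in columns), and the toy values are assumptions made for illustration and are not taken from the study's analysis code.

```python
import numpy as np

def population_response_magnitude(responses):
    """Return per-stimulus L2 norms and their mean.

    responses: hypothetical array of shape (n_stimuli, n_voxels)
    holding one response estimate per stimulus and voxel in one ROI.
    """
    responses = np.asarray(responses, dtype=float)
    per_stimulus = np.linalg.norm(responses, axis=1)  # L2 norm across voxels
    return per_stimulus, per_stimulus.mean()

def unit_directions(responses):
    """Scale each response vector to unit length, leaving only its direction."""
    responses = np.asarray(responses, dtype=float)
    norms = np.linalg.norm(responses, axis=1, keepdims=True)
    return responses / norms

# Toy ROI with two stimuli and two voxels (made-up numbers).
roi = np.array([[3.0, 4.0],
                [6.0, 8.0]])
per_stim, mean_mag = population_response_magnitude(roi)
directions = unit_directions(roi)
```

In this toy example the two stimuli differ in magnitude but share the same direction, which is the kind of dissociation between magnitude and vector direction that the caption refers to.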

Table S1: Dimensionality (participation ratio) and the noise ceiling distribution (synchronized (n=1000) permutations of stimuli and voxels) per monkey, ROI and condition.
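
As a hedged illustration of the dimensionality measure named in Table S1, the participation ratio is commonly defined from the eigenvalues $\lambda_i$ of the response covariance matrix as $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$. The sketch below assumes this standard definition and a stimuli-by-voxels layout; the function name and toy data are hypothetical, and the permutation-based noise ceiling is not reproduced here because its exact procedure is not specified in this section.

```python
import numpy as np

def participation_ratio(responses):
    """Participation ratio of a (n_stimuli, n_voxels) response matrix.

    Assumes the common definition (sum of eigenvalues)^2 / sum of squared
    eigenvalues, applied to the voxel-by-voxel covariance across stimuli.
    """
    X = np.asarray(responses, dtype=float)
    X = X - X.mean(axis=0, keepdims=True)      # center each voxel
    cov = X.T @ X / (X.shape[0] - 1)           # voxel-by-voxel covariance
    eig = np.linalg.eigvalsh(cov)              # eigenvalues of a symmetric matrix
    eig = np.clip(eig, 0.0, None)              # remove tiny negative round-off
    return eig.sum() ** 2 / np.sum(eig ** 2)

# Toy data: variance spread equally over two axes vs. confined to one axis.
isotropic = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
one_axis = np.array([[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]])
pr_isotropic = participation_ratio(isotropic)
pr_one_axis = participation_ratio(one_axis)
```

The measure ranges from 1 (all variance along a single axis) up to the number of dimensions (variance spread evenly), so higher values correspond to higher-dimensional geometries.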

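The model fits reported in Table S2 rely on a Spearman rank correlation computed while partialling out a low-level similarity model, followed by a Fisher transformation. A minimal sketch of that kind of computation is given below. It assumes the first-order partial correlation formula applied to ranked values; the function names, the use of the RDM upper triangle, and the toy vectors are illustrative assumptions rather than the study's actual pipeline.

```python
import numpy as np
from scipy.stats import rankdata

def upper_triangle(rdm):
    """Vectorize the off-diagonal upper triangle of a square RDM."""
    rdm = np.asarray(rdm, dtype=float)
    return rdm[np.triu_indices_from(rdm, k=1)]

def partial_spearman(x, y, z):
    """Spearman correlation between x and y with z partialled out.

    Ranks all three vectors, then applies the first-order partial
    correlation formula to the rank correlations.
    """
    rx, ry, rz = (rankdata(v) for v in (x, y, z))
    r = np.corrcoef([rx, ry, rz])
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy dissimilarity vectors (made-up values): data, candidate model,
# and a low-level control model standing in for the Gabor-based one.
data_vec = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
model_vec = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
control_vec = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

rho = partial_spearman(data_vec, model_vec, control_vec)
fisher_z = np.arctanh(rho)  # Fisher transformation of the coefficient
```

Working on ranks keeps the comparison insensitive to monotonic rescaling of the dissimilarities, which is one common reason for preferring rank correlations in representational similarity analysis.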
Table S2: Model-fits in representational similarity analysis (Fisher-transformed Spearman rank correlation coefficients).
a 2nd level similarity between the models (shape, mirror-symmetry, and appearance) and the experimental 1st level RDMs was calculated as Spearman rank correlation while partialling out low-level similarities 4 using a Gabor wavelet pyramid as a model of early visual cortex. 2nd level similarity for the model view-